huggingface / trl Public

generated from fastai/nbdev_template

Issues: huggingface/trl

[Tracking issue] Integrate native liger-kernel losses

#2495 opened Dec 17, 2024 by qgallouedec

Open 6

[Tracking issue] Wrong loss scaling when accumulating gradient

#2617 opened Jan 23, 2025 by qgallouedec

Open

393 Open 1,416 Closed

Issues list

LORA Continuos pre-training on 7B Instruct Model ✨ enhancement

New feature or request

⚡ PEFT

Related to PEFT

🏋 SFT

Related to SFT

#3509 opened May 29, 2025 by sinchanabhat

High KL Divergence in GRPO with GPT2-Style Model (Due to Dropout?) 🏋 GKD

Related to GKD

🏋 GRPO

Related to GRPO

🏋 SFT

Related to SFT

#3500 opened May 27, 2025 by cliang-huanglab

Converting a conversational dataset into a standard dataset [not working] 🐛 bug

Something isn't working

#3490 opened May 23, 2025 by nbasyl

5 tasks done

Completions Only Loss is incompatible with use_liger_kernel set as true 🐛 bug

Something isn't working

🏋 SFT

Related to SFT

#3484 opened May 22, 2025 by arashpreetsinghmor

Vision Fine Tuning Gemma 3 takes Impossiblily High VRam (OOM Error 8xH200) ⚡accelerate

Related to accelerate

🐛 bug

Something isn't working

⚡ PEFT

Related to PEFT

#3481 opened May 22, 2025 by amanmehra89

5 tasks done

【GRPO】Why are some batches of prompts not involved in training?

#3477 opened May 22, 2025 by moguizhizi

5 tasks done

Is it possible to make prompts dynamic (or iterable datasets) in GRPO training? ✨ enhancement

New feature or request

🏋 GRPO

Related to GRPO

#3474 opened May 21, 2025 by onlyjokers

[GPG][new trainer] Add support to new GPG method ✨ enhancement

New feature or request

#3472 opened May 20, 2025 by lerogo

3 tasks done

[GRPO] bnb quantization + vllm 🐛 bug

Something isn't working

🏋 GRPO

Related to GRPO

⚡ PEFT

Related to PEFT

#3466 opened May 18, 2025 by shon-otmazgin-wix

5 tasks done

PPO Training does not improve SFT model outputs (Metrics identical before and after PPO) 🏋 PPO

Related to PPO

🏋 SFT

Related to SFT

#3464 opened May 18, 2025 by xmriz

Turn off Accelerate acceleration ⚡accelerate

Related to accelerate

🏋 GRPO

Related to GRPO

#3461 opened May 17, 2025 by seTalent

Out of Memory when GRPO fine-tune Qwen3 4B model on 80G A100 GPU 🐛 bug

Something isn't working

🏋 GRPO

Related to GRPO

#3456 opened May 16, 2025 by wa008

5 tasks done

Reward_model error in cli GRPO train when I only use reward function rather than reward_model 🐛 bug

Something isn't working

📱 cli

Related to the Command-line interface

🏋 GRPO

Related to GRPO

🏋 Reward

Related to Reward modelling

#3455 opened May 16, 2025 by wa008

5 tasks done

PPO training fails when used with accelerate ⚡️ and Deepspeed 🚀 ⚡accelerate

Related to accelerate

🚀 deepspeed

Related to deepspeed

🏋 PPO

Related to PPO

🏋 SFT

Related to SFT

#3453 opened May 16, 2025 by marcellobullo

5 tasks done

GRPO reward=0 and loss=0 🏋 GRPO

Related to GRPO

🏋 Reward

Related to Reward modelling

#3452 opened May 15, 2025 by LIUyizheSDU

torch distributed training with multi gpus errors in GRPOtrainer 🐛 bug

Something isn't working

🏋 GRPO

Related to GRPO

#3451 opened May 15, 2025 by jinhonglu

5 tasks done

trl vllm-serve not working on latest. 🐛 bug

Something isn't working

🏋 GRPO

Related to GRPO

#3450 opened May 15, 2025 by tcapelle

5 tasks done

[GRPO] num_generations 🏋 GRPO

Related to GRPO

❓ question

Seeking clarification or more information

#3443 opened May 13, 2025 by shon-otmazgin-wix

5 tasks done

Unstructured data grpo training 🐛 bug

Something isn't working

🏋 GRPO

Related to GRPO

#3441 opened May 13, 2025 by yuyuhua918

FSDP SFT deepseekv2-prover7B OOM on 8*RTX3080

#3436 opened May 12, 2025 by LIUyizheSDU

[GRPO] NCCL timeout when GRPO training with vllm, but get "ValueError: The decoder prompt (length 5951) is longer than the maximum model length of 4096. Make sure that max_model_len is no smaller than the number of text tokens. "

#3433 opened May 11, 2025 by qyr0403

5 tasks done

The performance has deteriorated significantly after fine-tuning with TRL but increase while using llama-factory

#3432 opened May 11, 2025 by Hasuer

Add support for asynchronous reward functions in GRPOTrainer (and maybe other trainers)

#3426 opened May 8, 2025 by ideechy

[GRPO] How to train model using vLLM and model parallelism on one node?

#3424 opened May 8, 2025 by zhiqihuang

[Community Discussion] Progressive Tasks Datasets with Verification for Agentic RL 🏋 Reward

Related to Reward modelling

#3417 opened May 6, 2025 by August-murr
